Abstract: We study the notion of extensibility in functional data types, as a new
approach to the problem of decorating abstract syntax trees with additional information.
We observed the need for such extensibility while redesigning the data types
representing Haskell abstract syntax inside Glasgow Haskell Compiler (GHC).
Specifically, we describe a programming idiom that exploits type-level functions to
allow a particular form of extensibility. The approach scales to support existentials
and generalised algebraic data types, and we can use pattern synonyms to make it
convenient in practice.
1 Introduction

Back in the late 1970’s, David Turner’s inspirational work on SK-combinators 
[Turner, 1979b,a], and his languages Sasl [Turner, 1976], KRC [Turner, 1982], 
and Miranda [Turner, 1985], were hugely influential in the early development 
of functional programming. They introduced a generation of young computer 
scientists to the joy and beauty of functional programming, in a very direct and 
concrete way: elegant ideas; simple, perspicuous writing; and compelling interactive
implementations. David’s work has had sustained impact; for example,
Miranda had a major influence on the design of Haskell [Hudak et al., 2007]. 

Algebraic Data Types (ADTs) and pattern matching are now firmly established
as a core feature of any modern functional language. They first appeared
as a usable feature in Hope [Burstall et al., 1980], and were rapidly adopted
in ML [Milner, 1984], and in David Turner’s Miranda. ADTs make functional
languages a fertile ground in which to define and process tree-like structures.
However, trees often cannot grow: once a data type is defined and compiled,



Najd S., Jones S.P.: Trees that Grow




its definition cannot be extended by adding new data constructors, and/or by 
adding new fields to its existing data constructors. 

This lack of extensibility can be very painful. For example, at the centre of
all compilers stand tall trees representing the abstract syntax of terms. Compiler
programs processing these trees often do so by decorating the trees with
additional information. For instance, a name resolution phase adds information
about names, and a type inference phase stores the inferred types in the relevant
nodes. We refer to such extra information as decorations. The additional
information may appear as new fields to the existing data constructors, and/or
new data constructors in data types representing the trees.

The compiler writer is then faced with two unpalatable choices. She can define 
a new data type representing the output decorated tree, at the cost of much 
duplication. Or she can write a single data type with all the necessary fields 
and constructors, at the cost of having many unused fields and constructors at 
different stages of compilation. 

This dilemma is very real. The Glasgow Haskell Compiler (GHC) has a single 
data type HsSyn that crosses several compiler phases; and a second entire data 
type TH.Syntax for Template Haskell. Moreover, some Haskell libraries, notably
haskell-src-exts, define yet another data type for Haskell source code. These
data types are large (dozens of types, hundreds of constructors) and are very 
difficult to keep in sync. 

In this paper we offer a systematic programming idiom that resolves the 
dilemma, by providing a way to extend data types within Haskell. We leverage 
type-level openness to allow extensibility of term-level data constructors. 
Specifically, we make the following contributions:

— We describe a simple but powerful programming idiom that allows a data 
type to be extended both with extra constructor-specific fields and with 
extra constructors (Section 3). 

— We show that the idea can be extended to work for existentials and GADTs 
(Section 3.10). 

We discuss related work in Section 5 and conclude in Section 6. 

On a personal note, David’s papers and language implementations played
a major role in drawing one of us (Simon) into the world of functional programming.
My very first paper, Yacc in Sasl [Peyton Jones, 1985], was a parser
generator for Sasl, and David acted as a mentor for me, at a time when I had 
no idea what programming language research was, or how to do it. Thank you 
David: I will be forever grateful for your encouragement and guidance in the 
launch phase of my professional life. 





2 The challenge 

In this section, we demonstrate the problem of decorating trees, and sketch some 
conventional ways to address it. 

2.1 Tree-Decoration Problem 

A compiler might need several variants of data types representing terms. For 
example: 

— We might want to label every node with its source location. 

— After name resolution we might want to decorate names in the tree with 
additional information, such as their namespace. 

— After type inference we might want to decorate some (but not all) constructors
of the tree with inferred types.

— The type checker might record type abstractions and applications that are 
not present in the source code. For this it would need to add new data 
constructors to the type — and for these constructors a source location 
might not make sense. 

One approach is to declare a completely new data type for each variant, but
this is obviously unattractive because of the duplication it involves. In a realistic
setting, the abstract syntax for a source language might have tens of data
types (expressions, patterns, guards, comprehensions, declarations, sequences,
bindings, matches, etc.), and hundreds of data constructors altogether.

The Glasgow Haskell Compiler (GHC) makes an excellent (if incestuous) 
case study for the challenge of extensibility. In GHC, the syntax of Haskell, 
HsSyn, defines no fewer than 97 distinct data types with a total of 321 data 
constructors. It would be completely infeasible to define multiple variants of 
such a huge collection of types. Not only would it be terrible to duplicate the 
data structures, but we would also have to duplicate general functions like the 
pretty printer. 

2.2 So what does GHC do? 

Faced with this dilemma, what does GHC do in practice? It adopts a variety of 
strategies: 

— Straightforward parameterisation. The entire syntax is parameterised over 
the type of variables, so that we have¹

¹ These types are much simplified, but they convey the right idea for present purposes.

    parse     :: String -> HsExpr RdrName
    rename    :: HsExpr RdrName -> HsExpr Name
    typecheck :: HsExpr Name -> HsExpr Id

For example, the type checker replaces each Name with an Id; the latter is 
a Name decorated with a Type. 

— Extra data constructors. The data types include parts like 
data HsPat id = ... 

| ConPat id [Located (HsPat id)] 

| ConPatOut id ... other fields ... 

where the type checker replaces all uses of ConPat with ConPatOut. This is 
clearly unsatisfactory because the passes before the type checker will never 
meet a ConPatOut but there is no static guarantee of that fact. 

— Alternating data types. GHC needs to pin a source location on every source- 
syntax node (e.g., for reporting errors). It does so by alternating between two 
types. In the ConPat constructor above, the Located type is defined thus: 
data Located x = L SrcLoc x 

So there is a type-enforced alternation between HsPat and Located nodes. 
This idiom works quite well, but is often tiresome when traversing a tree 
because there are so many L nodes to skip over. 

— Phase-indexed fields. GHC uses the power of type families (Chakravarty et al. 
[2005]) to describe fields that are present only before or after a specific phase. 
For example, we see 

data HsExpr id = ... 

| ExplicitPArr (PostTc id Type) [LHsExpr id] 
where the PostTc type family is defined thus: 
type family PostTc id a 

type instance PostTc RdrName a = () 
type instance PostTc Name a = () 

type instance PostTc Id a = a 

This idiom makes use of the fact that HsSyn is parameterised on the type 
of identifiers, and that type makes a good proxy for the compiler phase. So 
the first field of an ExplicitPArr is () after parsing and after renaming, but 
is Type after type checking. 
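
The phase-indexed field trick can be tried in isolation. The following is only a minimal sketch, not GHC's actual code: RdrName, Name, Id and Type are hypothetical stand-ins for GHC's real types, and HsExpr is reduced to two constructors, with ExplicitList playing the role of ExplicitPArr.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Hypothetical, much simplified stand-ins for GHC's identifier types.
newtype RdrName = RdrName String
newtype Name    = Name String
data    Type    = IntTy | FunTy Type Type deriving (Eq, Show)
data    Id      = Id String Type

-- A field of type `PostTc id a` is () before type checking and `a` after it.
type family PostTc id a
type instance PostTc RdrName a = ()
type instance PostTc Name    a = ()
type instance PostTc Id      a = a

-- A cut-down expression type with one phase-indexed field (assumption).
data HsExpr id
  = HsVar id
  | ExplicitList (PostTc id Type) [HsExpr id]

-- After parsing, the only value the field can hold is ().
parsed :: HsExpr RdrName
parsed = ExplicitList () [HsVar (RdrName "x")]

-- After type checking, the same field carries a Type.
checked :: HsExpr Id
checked = ExplicitList IntTy [HsVar (Id "x" IntTy)]

-- Reading the field as a Type is only well typed at the Id phase.
elemType :: HsExpr Id -> Maybe Type
elemType (ExplicitList t _) = Just t
elemType _                  = Nothing
```

Writing `ExplicitList IntTy [...]` at type `HsExpr RdrName` is rejected by the type checker, which is the static guarantee this strategy buys.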

All this works well enough for GHC, but it is very GHC-specific. Other tools that want
to parse and analyse Haskell source code define their own data types; the widely-used
library haskell-src-exts is a good example. Even GHC defines a completely
separate data type for Template Haskell in Language.Haskell.TH.Syntax.
These data types rapidly get out of sync, and involve a great deal of duplicated
effort.

3 An idiom that supports data type extension 

We now introduce a programming idiom that allows data types to be extended
in a more systematic way than the ad-hoc tricks described above.

We explain our solution with a running example. This paper is typeset from
a literate Haskell source file using lhs2TeX [Hinze and Löh, 2015], and the code
runs on GHC 8.0 using a set of well-established language extensions.

    {-# LANGUAGE TypeFamilies, DataKinds, ConstraintKinds #-}
    {-# LANGUAGE GADTs, EmptyCase, StandaloneDeriving #-}
    {-# LANGUAGE TypeOperators, PatternSynonyms #-}
    {-# LANGUAGE FlexibleInstances, FlexibleContexts #-}
    import GHC.Types (Constraint)

3.1 Extensible ADT Declarations 

As a running example, consider the following language of simply-typed lambda 
terms with integer literals, and explicit type annotations: 

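\[
\begin{array}{rcl}
i &\in& \text{integers} \\
x, y &\in& \text{variables} \\
A, B, C \in \mathrm{TYP} &::=& \mathrm{Int} \mid A \to B \\
L, M, N \in \mathrm{EXP} &::=& i \mid x \mid M :: A \mid \lambda x.\, N \mid L\ M
\end{array}
\]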

In Haskell, the language above can be declared as the following data types: 

    data Exp = Lit Integer
             | Var Var
             | Ann Exp Typ
             | Abs Var Exp
             | App Exp Exp

    type Var = String

    data Typ = Int
             | Fun Typ Typ


The data type Exp is not extensible. Our idea is to make it extensible like this: 
    data ExpX ξ = LitX (XLit ξ) Integer
                | VarX (XVar ξ) Var
                | AnnX (XAnn ξ) (ExpX ξ) Typ
                | AbsX (XAbs ξ) Var (ExpX ξ)
                | AppX (XApp ξ) (ExpX ξ) (ExpX ξ)
                | ExpX (XExp ξ)

    type family XLit ξ
    type family XVar ξ
    type family XAnn ξ
    type family XAbs ξ
    type family XApp ξ
    type family XExp ξ


In this new data type declaration: 

- ξ is a type index to ExpX. We call ξ the extension descriptor, because it
describes which extension is in use. For example, ExpX TC might be a variant
of ExpX for the type checker for a language; we will see many examples
shortly.

- Each data constructor C has an extra field of type XC ξ, where XC is a type
family, or type-level function [Chakravarty et al., 2005]. We can use this field
to extend a data constructor with extra fields (Section 3.3). For example, if
we define XApp TC to be Typ, the App constructor of a tree of type ExpX TC
will have a Typ field.

- The data type has one extra data constructor ExpX, which has one field
of type XExp ξ. We can use this field to extend the data type with new
constructors (Section 3.4).

Now, we can use the above extensible data type to define a completely undecorated
(UD) variant of ExpX as follows.

    type ExpUD = ExpX UD

    data UD

    type instance XLit UD = Void
    type instance XVar UD = Void
    type instance XAnn UD = Void
    type instance XAbs UD = Void
    type instance XApp UD = Void
    type instance XExp UD = Void


Since the non-decorated variant does not introduce any forms of extensions,
all mappings are set to Void,² which is declared (in Data.Void) like this:

    data Void

    void :: Void
    void = error "Attempt to evaluate void"

    absurd :: Void -> a
    absurd m = case m of { }


That is, Void is a data type with no constructors, so it is inhabited only by 
bottom. 

² Ignoring the bottom type, () is used for empty field decorations, since for products
(constructor fields) the types ((), A) and (A, ()) are isomorphic to A.

With this instantiation, ExpX UD is almost exactly³ isomorphic to the original
data type Exp; that is, there is a 1-1 correspondence between values of
ExpX UD and values of Exp.

The alert reader may realise that the type instance declarations can all be
omitted because, in the absence of such instances, a type such as XAnn UD is
irreducible and hence is an empty type just like Void. But then there is no way
to prevent clients of ExpX UD from accidentally adding an instance for XAnn UD,
so we generally prefer to prevent that by giving an explicit instance.
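
The claimed correspondence between Exp and ExpX UD can be made concrete with a pair of conversion functions. This is a sketch under assumptions: the names toUD and fromUD are ours, `xi` stands for the descriptor ξ, and `void` is defined locally as in the text because the Data.Void module in base exports absurd but no such value.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Void (Void, absurd)

type Var = String
data Typ = Int | Fun Typ Typ deriving (Eq, Show)

-- The original, non-extensible type.
data Exp = Lit Integer | Var Var | Ann Exp Typ | Abs Var Exp | App Exp Exp
  deriving (Eq, Show)

-- The extensible type; xi is the extension descriptor.
data ExpX xi
  = LitX (XLit xi) Integer
  | VarX (XVar xi) Var
  | AnnX (XAnn xi) (ExpX xi) Typ
  | AbsX (XAbs xi) Var (ExpX xi)
  | AppX (XApp xi) (ExpX xi) (ExpX xi)
  | ExpX (XExp xi)

type family XLit xi
type family XVar xi
type family XAnn xi
type family XAbs xi
type family XApp xi
type family XExp xi

data UD
type instance XLit UD = Void
type instance XVar UD = Void
type instance XAnn UD = Void
type instance XAbs UD = Void
type instance XApp UD = Void
type instance XExp UD = Void

-- The only "value" of Void is bottom; the lazy fields never force it here.
void :: Void
void = error "Attempt to evaluate void"

-- Hypothetical helper: embed a plain tree into the undecorated variant.
toUD :: Exp -> ExpX UD
toUD (Lit i)   = LitX void i
toUD (Var x)   = VarX void x
toUD (Ann m a) = AnnX void (toUD m) a
toUD (Abs x n) = AbsX void x (toUD n)
toUD (App l m) = AppX void (toUD l) (toUD m)

-- Hypothetical helper: forget the (empty) decorations again.
fromUD :: ExpX UD -> Exp
fromUD (LitX _ i)   = Lit i
fromUD (VarX _ x)   = Var x
fromUD (AnnX _ m a) = Ann (fromUD m) a
fromUD (AbsX _ x n) = Abs x (fromUD n)
fromUD (AppX _ l m) = App (fromUD l) (fromUD m)
fromUD (ExpX v)     = absurd v   -- no total value can reach this case
```

The last clause of fromUD is where the "almost exactly" caveat shows up: the extension constructor can only hold bottom at this descriptor.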

3.2 Pattern Synonyms for Convenience 

One can program directly over the new ExpX type, but it is a bit less convenient
than it was with the original Exp data type:

— When pattern matching, we must ignore the extra field in each constructor. 

— When constructing, we must supply void in the extra field. 

For example: 

    incLit :: Exp -> Exp
    incLit (Lit i) = Lit (i + 1)
    incLit e       = e

    incLitX :: ExpUD -> ExpUD
    incLitX (LitX _ i) = LitX void (i + 1)  -- Tiresome clutter
    incLitX e          = e

Solving this kind of inconvenience is exactly what pattern synonyms were 
invented for [Pickering et al., 2016]. We may define a pattern synonym thus 

    pattern LitUD :: Integer -> ExpUD
    pattern LitUD i <- LitX _ i
      where LitUD i = LitX void i

and similarly for all the other data constructors. This is a so-called bidirectional
pattern synonym. In a pattern, LitUD i expands to LitX _ i, while in a term
LitUD i expands to LitX void i. So now we can write

    incLitX :: ExpUD -> ExpUD
    incLitX (LitUD i) = LitUD (i + 1)  -- No tiresome clutter
    incLitX e         = e

³ We say “almost exactly” because the term value ExpX void has no counterpart
in Exp; alas Haskell lacks an entirely uninhabited type. We can simply hide the
constructor ExpX from the client users to ameliorate this problem.
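
To see the bidirectional synonyms at work, here is a self-contained sketch on a cut-down version of the extensible type (only literals, variables and the extension constructor). VarUD and the observer litValue are hypothetical additions for illustration.

```haskell
{-# LANGUAGE TypeFamilies, PatternSynonyms #-}

import Data.Void (Void)

-- Cut-down extensible type; xi is the extension descriptor.
data ExpX xi
  = LitX (XLit xi) Integer
  | VarX (XVar xi) String
  | ExpX (XExp xi)

type family XLit xi
type family XVar xi
type family XExp xi

data UD
type instance XLit UD = Void
type instance XVar UD = Void
type instance XExp UD = Void

type ExpUD = ExpX UD

void :: Void
void = error "Attempt to evaluate void"

-- Explicitly bidirectional: matching ignores the extension field,
-- construction fills it with void.
pattern LitUD :: Integer -> ExpUD
pattern LitUD i <- LitX _ i
  where LitUD i = LitX void i

pattern VarUD :: String -> ExpUD
pattern VarUD x <- VarX _ x
  where VarUD x = VarX void x

incLitX :: ExpUD -> ExpUD
incLitX (LitUD i) = LitUD (i + 1)
incLitX e         = e

-- Hypothetical observer, used only to inspect results.
litValue :: ExpUD -> Maybe Integer
litValue (LitUD i) = Just i
litValue _         = Nothing
```

Client code written against LitUD and VarUD never mentions the extension field, so it reads the same as code over the original Exp.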
3.3 New Field Extensions


Now, consider the following simple type system for our example language. 


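Typing judgement: \(\Gamma \vdash M : A\)

\[
\frac{}{\Gamma \vdash i : \mathrm{Int}}
\qquad
\frac{(x : A) \in \Gamma}{\Gamma \vdash x : A}
\qquad
\frac{\Gamma \vdash M : A}{\Gamma \vdash M :: A : A}
\]

\[
\frac{x : A,\ \Gamma \vdash N : B}{\Gamma \vdash \lambda x.\, N : A \to B}
\qquad
\frac{\Gamma \vdash L : A \to B \qquad \Gamma \vdash M : A}{\Gamma \vdash L\ M : B}
\]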


Before type checking, often abstract syntax trees (ASTs) are processed by a 
type inference engine. The output of the type inference engine is the same input 
tree decorated with additional type information. Type inference helps users to 
leave certain bits of their programs without explicit type annotations. Type 
inference also helps in simplifying the type checker: after type inference, and 
decorating the trees with the additional type information, type checking becomes 
a straightforward syntax-directed recursive definition. To accommodate for the 
additional information in the output, we need larger trees, and hence we need 
to extend the original declarations. For instance, the following highlights the 
required changes to the non-extensible Exp data type: 


data Exp = ... | App Typ Exp Exp 


The definition is just like that of Exp, save for extending constructor App
with a new field of type Typ, as highlighted above. The duplication is unpleasant
(particularly when the data type is much larger).

In the extensible setting both non-decorated and decorated variants of Exp
can be defined as extensions to the same base extensible data type ExpX. Following
the same approach as before, we can define a decorated variant of
ExpX suitable for type checking (TC) as follows.


    type ExpTC = ExpX TC

    data TC

    type instance XLit TC = Void
    type instance XVar TC = Void
    type instance XAnn TC = Void
    type instance XAbs TC = Void
    type instance XApp TC = Typ
    type instance XExp TC = Void


The difference (highlighted) is just that the App constructor gets an extra 
field of type Typ, just as required. 

The pattern synonyms for ExpTC can be defined as before, save for constructor
AppX, which takes an extra argument for the new field introduced by the
extension, as highlighted below:

    pattern AppTC :: Typ -> ExpTC -> ExpTC -> ExpTC
    pattern AppTC a l m = AppX a l m
3.4 New Constructor Extensions

We could as well consider scenarios where extended data types introduce new
constructors. For instance, consider a simple partial evaluation (PE) pass over
the trees, where β-redexes are normalised away in an input tree. After reducing a
redex to a value, the partial evaluator stores the value as a relevant node inside
the tree. To be able to decorate the tree with this new information (i.e., values),
often new constructors should be introduced to the declarations. For instance,
the following highlights the required changes (i.e., the new constructor Val) to
the non-extensible Exp data type:


data Val = ... 

data Exp = ... | Val Val 


We can still reuse our extensible data type ExpX to define a variant suitable for
such partial evaluation (PE) by extending it with a new constructor ValPE as

    type ExpPE = ExpX PE

    data PE

    type instance XLit PE = Void
    type instance XVar PE = Void
    type instance XAnn PE = Void
    type instance XAbs PE = Void
    type instance XApp PE = Void
    type instance XExp PE = Val


The pattern synonyms for ExpX PE can be defined as before, except that
we introduce a new pattern synonym ValPE that represents the new constructor
introduced by the extension, as highlighted below:

    pattern ValPE :: Val -> ExpPE
    pattern ValPE v = ExpX v


3.5 Normal Functions on Extended Data Types 

Aided by the pattern synonyms, programming over the extended data type feels 
very much like programming over an ordinary non-extensible data type. For 
example, here is a type checker following the typing rules in Section 3.3: 


    -- env plays the role of the typing context Γ in the rules of Section 3.3
    check :: ExpTC -> [(Var, Typ)] -> Typ -> Bool
    check (LitTC _)     _   Int       = True
    check (VarTC x)     env c         = maybe False (== c) (lookup x env)
    check (AnnTC m a)   env c         = a == c && check m env c
    check (AbsTC x n)   env (Fun a b) = check n ((x, a) : env) b
    check (AppTC a l m) env c         = check l env (Fun a c) && check m env a
    check _             _   _         = False
One significant annoyance is that GHC is not yet clever enough to know when
pattern synonyms are exhaustive, so the pattern-match exhaustiveness checker 
is effectively powerless. 
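
For readers who want to run the checker, the following sketch assembles the pieces from Sections 3.1 to 3.5 into one module. It is an illustration under assumptions rather than the paper's literate source: `xi` stands for ξ, `env` for Γ, and the synonyms LitTC, VarTC, AnnTC and AbsTC are written out by analogy with LitUD.

```haskell
{-# LANGUAGE TypeFamilies, PatternSynonyms #-}

import Data.Void (Void)

type Var = String
data Typ = Int | Fun Typ Typ deriving (Eq, Show)

data ExpX xi
  = LitX (XLit xi) Integer
  | VarX (XVar xi) Var
  | AnnX (XAnn xi) (ExpX xi) Typ
  | AbsX (XAbs xi) Var (ExpX xi)
  | AppX (XApp xi) (ExpX xi) (ExpX xi)
  | ExpX (XExp xi)

type family XLit xi
type family XVar xi
type family XAnn xi
type family XAbs xi
type family XApp xi
type family XExp xi

data TC
type instance XLit TC = Void
type instance XVar TC = Void
type instance XAnn TC = Void
type instance XAbs TC = Void
type instance XApp TC = Typ   -- the only decorated constructor
type instance XExp TC = Void

type ExpTC = ExpX TC

void :: Void
void = error "Attempt to evaluate void"

pattern LitTC :: Integer -> ExpTC
pattern LitTC i <- LitX _ i
  where LitTC i = LitX void i

pattern VarTC :: Var -> ExpTC
pattern VarTC x <- VarX _ x
  where VarTC x = VarX void x

pattern AnnTC :: ExpTC -> Typ -> ExpTC
pattern AnnTC m a <- AnnX _ m a
  where AnnTC m a = AnnX void m a

pattern AbsTC :: Var -> ExpTC -> ExpTC
pattern AbsTC x n <- AbsX _ x n
  where AbsTC x n = AbsX void x n

-- App carries a real decoration, so one equation serves both directions.
pattern AppTC :: Typ -> ExpTC -> ExpTC -> ExpTC
pattern AppTC a l m = AppX a l m

-- Syntax-directed checker; the Typ stored in AppTC is the argument type.
check :: ExpTC -> [(Var, Typ)] -> Typ -> Bool
check (LitTC _)     _   Int       = True
check (VarTC x)     env c         = maybe False (== c) (lookup x env)
check (AnnTC m a)   env c         = a == c && check m env c
check (AbsTC x n)   env (Fun a b) = check n ((x, a) : env) b
check (AppTC a l m) env c         = check l env (Fun a c) && check m env a
check _             _   _         = False
```

The final catch-all clause is needed for the reason given above: the exhaustiveness checker cannot see that the five synonyms cover the type.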


3.6 Generic Functions on Extensible Data Types 

We can sometimes exploit the common structure inherited from an extensible 
data type, to define generic functions acting uniformly over the extending data 
types. For instance, we can define a generic printer function once and reuse it 
for the extended data types. Let us begin with a simple printer that ignores 
the decorations introduced as new fields in the data type. For instance, such 
a printer works the same for both the undecorated data type ExpUD and the
decorated data type ExpTC. Compilers often use such printers across multiple
phases to print terms while reporting error messages. 

For the new constructor extensions, we can either ignore them like we do 
for the new fields, or use function parameters to handle printing of these new 
constructors. We choose to do the latter in the following example. 


    printT :: Typ -> String
    printT Int       = "Int"
    printT (Fun a b) = "(" ++ printT a ++ ") -> " ++ printT b

    printE :: (XExp ξ -> String) -> ExpX ξ -> String
    printE _ (LitX _ i)   = show i
    printE _ (VarX _ x)   = x
    printE p (AnnX _ m a) = "(" ++ printE p m ++ ") :: (" ++ printT a ++ ")"
    printE p (AbsX _ x n) = "λ" ++ x ++ "." ++ printE p n
    printE p (AppX _ l m) = "(" ++ printE p l ++ ") (" ++ printE p m ++ ")"
    printE p (ExpX ξ)     = p ξ


Above, we chose to pass explicitly the function parameters used for printing the 
possible new constructors. We could as well use type classes. 

Having defined the above generic printer, we can reuse it to define printers
for extending data types ExpUD, ExpTC, and ExpPE as follows.

    printEUD :: ExpUD -> String
    printEUD = printE absurd

    printETC :: ExpTC -> String
    printETC = printE absurd

    printEPE :: ExpPE -> String
    printEPE = printE p
      where p v = "{{" ++ show v ++ "}}"

    deriving instance Show Val


Since both ExpUD and ExpTC introduce no new constructors, the parameters
passed to the generic function do plain matching on empty types. For ExpPE,
however, we pass a printer function handling values of the new constructor Val.
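
The generic printer can be exercised end to end as follows. This sketch makes several assumptions that are not in the text: Val is given a hypothetical one-constructor definition (the paper elides it), unused extension fields are set to () rather than Void so that example values need no bottoms, and the lambda is printed as a backslash to keep the output ASCII.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Void (Void, absurd)

type Var = String
data Typ = Int | Fun Typ Typ

data ExpX xi
  = LitX (XLit xi) Integer
  | VarX (XVar xi) Var
  | AnnX (XAnn xi) (ExpX xi) Typ
  | AbsX (XAbs xi) Var (ExpX xi)
  | AppX (XApp xi) (ExpX xi) (ExpX xi)
  | ExpX (XExp xi)

type family XLit xi
type family XVar xi
type family XAnn xi
type family XAbs xi
type family XApp xi
type family XExp xi

-- Hypothetical value type for the partial evaluator.
newtype Val = IntVal Integer deriving Show

-- Undecorated variant: no new constructor, so XExp is empty.
data UD
type instance XLit UD = ()
type instance XVar UD = ()
type instance XAnn UD = ()
type instance XAbs UD = ()
type instance XApp UD = ()
type instance XExp UD = Void

-- Partial-evaluation variant: the extension constructor holds a Val.
data PE
type instance XLit PE = ()
type instance XVar PE = ()
type instance XAnn PE = ()
type instance XAbs PE = ()
type instance XApp PE = ()
type instance XExp PE = Val

printT :: Typ -> String
printT Int       = "Int"
printT (Fun a b) = "(" ++ printT a ++ ") -> " ++ printT b

-- One printer for every variant; p handles the extension constructor.
printE :: (XExp xi -> String) -> ExpX xi -> String
printE _ (LitX _ i)   = show i
printE _ (VarX _ x)   = x
printE p (AnnX _ m a) = "(" ++ printE p m ++ ") :: (" ++ printT a ++ ")"
printE p (AbsX _ x n) = "\\" ++ x ++ "." ++ printE p n
printE p (AppX _ l m) = "(" ++ printE p l ++ ") (" ++ printE p m ++ ")"
printE p (ExpX x)     = p x

printEUD :: ExpX UD -> String
printEUD = printE absurd

printEPE :: ExpX PE -> String
printEPE = printE (\v -> "{{" ++ show v ++ "}}")
```

Passing absurd for UD documents, in the type, that the extension constructor cannot occur there.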
3.7 Type Classes for Extensible Data Types

For the generic printer printE, we chose to ignore the new field extensions. We
could as well make a variant that also prints the new field extensions. Such a
printer is useful for debugging purposes. To implement such a printer for ExpX,
as before, we need to provide five more function parameters to handle new field
extensions in each of the five constructors. The type of the generic function then
becomes

    printE :: (XLit ξ -> String) -> (XVar ξ -> String) -> (XAnn ξ -> String) ->
              (XAbs ξ -> String) -> (XApp ξ -> String) -> (XExp ξ -> String) ->
              ExpX ξ -> String

Here, with this approach, genericity comes at the price of a long list of parameters
that need to be passed around. But this is exactly what Haskell’s type
classes were designed to solve! We can instead write⁴

    instance (Show (XLit ξ), Show (XVar ξ), Show (XAnn ξ),
              Show (XAbs ξ), Show (XApp ξ), Show (XExp ξ)) =>
             Show (ExpX ξ) where
      show = ...

and all the extra parameter passing becomes invisible. 

Using the constraint kinds extension in GHC, we can make the process of 
declaring such generic instances easier by abstracting over the constraint as 

    type ForallX (φ :: * -> Constraint) ξ
      = ( φ (XLit ξ), φ (XVar ξ), φ (XAnn ξ)
        , φ (XAbs ξ), φ (XApp ξ), φ (XExp ξ) )

Hence, by using the above, the header of the instance declaration for ExpX
becomes as simple as

    instance ForallX Show ξ => Show (ExpX ξ) where
      show = ...

In this case, and many others, we can even use Haskell’s automatic (standalone)
instance deriving to implement the show method for us:

    deriving instance ForallX Show ξ => Show (ExpX ξ)

⁴ The type checker complains about the decidability of type class resolution, because
the constraints in the instance context are no smaller than those in the head. Therefore,
we need to supply the compiler with a -XUndecidableInstances flag, because
we, as the programmers, know that the process is terminating for our use cases.
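
The constraint synonym and the standalone deriving clause can be checked on a cut-down type. In this sketch the descriptor Pos (literals decorated with a line number) is hypothetical, `c` stands for φ and `xi` for ξ, and Data.Kind is used in place of the `*` kind of the listing above. Decorations are chosen so that every field can actually be shown; a field holding the bottom value void would make the derived show diverge.

```haskell
{-# LANGUAGE TypeFamilies, ConstraintKinds, KindSignatures #-}
{-# LANGUAGE StandaloneDeriving, FlexibleContexts, UndecidableInstances #-}

import Data.Kind (Type, Constraint)
import Data.Void (Void)

-- Cut-down extensible type: literals, applications, extension constructor.
data ExpX xi
  = LitX (XLit xi) Integer
  | AppX (XApp xi) (ExpX xi) (ExpX xi)
  | ExpX (XExp xi)

type family XLit xi
type family XApp xi
type family XExp xi

-- "Constraint c holds for every extension field at descriptor xi".
type ForallX (c :: Type -> Constraint) xi
  = (c (XLit xi), c (XApp xi), c (XExp xi))

-- One instance for all descriptors whose decorations are showable.
deriving instance ForallX Show xi => Show (ExpX xi)

-- Hypothetical descriptor: literals carry a line number.
data Pos
type instance XLit Pos = Int
type instance XApp Pos = ()
type instance XExp Pos = Void

example :: ExpX Pos
example = AppX () (LitX 1 10) (LitX 2 20)
```

The same ForallX synonym works for other classes, for instance `ForallX Eq xi => Eq (ExpX xi)`.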
3.8 Replacing Constructors

In some compiler passes, changes to trees can be beyond mere extensions. For
instance, a pass may require the type of a field in a constructor to change.
Consider the common pass in compilers where a chain of term applications is
grouped together to form a saturated application where the term at the head
of the chain is directly applied to a list of terms (often with η-expansions so
that the size of the arguments list matches the arity of the head term). In our
running example, to store the result of the saturation pass in a variant of Exp,
we change the type of the arguments in constructor App to a list of terms:

    data Exp = ... | App Exp [Exp]

Such a change to the type of a field in a constructor, and in general changes 
to a constructor beyond what can be achieved by adding new fields (and smart 
use of pattern synonyms) can still be achieved following our idiom by replacing 
the constructor with a new one. 

The act of replacing a constructor can be seen as two distinct extensions: 
(a) adding a new constructor, (b) removing the old constructor. Removing a 
constructor is achieved by extending it with a new field of an empty type. As 
mentioned earlier, Haskell does not have such an empty type, as all types are 
inhabited by bottom, but we can achieve a similar result by not exposing the 
removed constructor to the client user (as a part of the interface of the extended 
data type). 

Assuming AppX is not exposed to the client users, the following defines a variant
of ExpX with fully saturated applications (SA), where the type of arguments
in application terms is changed to a list of terms.

    type ExpSA = ExpX SA

    data SA

    type instance XLit SA = Void
    type instance XVar SA = Void
    type instance XAnn SA = Void
    type instance XAbs SA = Void
    type instance XApp SA = Void
    type instance XExp SA = (ExpSA, [ExpSA])


Now the new exposed application constructor can be defined by the following 
pattern synonym: 

    pattern AppSA :: ExpSA -> [ExpSA] -> ExpSA
    pattern AppSA l ms = ExpX (l, ms)
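
A small saturation pass shows the replaced constructor in use. This is a sketch on a cut-down language (literals, variables and application only) with no η-expansion; the functions saturate and arity are hypothetical names introduced for illustration.

```haskell
{-# LANGUAGE TypeFamilies, PatternSynonyms #-}

import Data.Void (Void)

-- Plain source terms with binary application.
data Exp = Lit Integer | Var String | App Exp Exp

data ExpX xi
  = LitX (XLit xi) Integer
  | VarX (XVar xi) String
  | AppX (XApp xi) (ExpX xi) (ExpX xi)
  | ExpX (XExp xi)

type family XLit xi
type family XVar xi
type family XApp xi
type family XExp xi

data SA
type instance XLit SA = Void
type instance XVar SA = Void
type instance XApp SA = Void                 -- binary AppX is never built
type instance XExp SA = (ExpX SA, [ExpX SA]) -- head and argument list

type ExpSA = ExpX SA

void :: Void
void = error "Attempt to evaluate void"

-- The replacement application constructor.
pattern AppSA :: ExpSA -> [ExpSA] -> ExpSA
pattern AppSA l ms = ExpX (l, ms)

-- Collect the spine of nested binary applications into one AppSA node.
saturate :: Exp -> ExpSA
saturate e = case spine e [] of
  (h, [])   -> h
  (h, args) -> AppSA h args
  where
    spine (App l m) args = spine l (saturate m : args)
    spine (Lit i)   args = (LitX void i, args)
    spine (Var x)   args = (VarX void x, args)

-- Number of arguments at the root of a saturated term.
arity :: ExpSA -> Int
arity (AppSA _ ms) = length ms
arity _            = 0
```

Because saturate never builds AppX, hiding that constructor from clients loses nothing for this variant.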
3.9 Extensions Using Type Parameters

The running example we considered so far has had no type parameters, besides 
the extension descriptor parameter that we have introduced. Many of the data 
types that we want to extend do have type parameters. For instance, consider a 
variant of Exp that is parametric at the type of variables: 

data Exp a = ... | Var a | Abs a (Exp a) 

Also consider a variant of the above with additional let expressions, as often 
introduced by passes such as let-insertion: 

data Exp a = ... | Let a (Exp a) (Exp a) 

Our idiom can also describe such an extension even though the extension (i.e., 
type variables in let bindings) is referring to the type parameter a. The general 
idea is to directly pass the type parameters in a constructor to the corresponding 
extension type functions: 


    data ExpX ξ α
      = LitX (XLit ξ α) Integer
      | VarX (XVar ξ α) Var
      | AnnX (XAnn ξ α) (ExpX ξ α) Typ
      | AbsX (XAbs ξ α) Var (ExpX ξ α)
      | AppX (XApp ξ α) (ExpX ξ α) (ExpX ξ α)
      | ExpX (XExp ξ α)

    type family XLit ξ α
    type family XVar ξ α
    type family XAnn ξ α
    type family XAbs ξ α
    type family XApp ξ α
    type family XExp ξ α


The extension introducing let expressions (LE) is defined the same as before,
this time with access to the type parameter:

    type ExpLE α = ExpX LE α

    data LE

    type instance XLit LE α = Void
    type instance XVar LE α = Void
    type instance XAnn LE α = Void
    type instance XAbs LE α = Void
    type instance XApp LE α = Void
    type instance XExp LE α = (α, ExpLE α, ExpLE α)

Now, we can define a pattern synonym for the new constructor as before:

    pattern LetLE :: α -> ExpLE α -> ExpLE α -> ExpLE α
    pattern LetLE x m n = ExpX (x, m, n)


Similarly, we can support extensible data types with more than one type 
variable by passing them all to the extension type functions. 
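
As an illustration of an extension that mentions the type parameter, the sketch below adds let expressions to a cut-down term language. It assumes that variables are of the parameter type `a`, as in the parametric Exp above; the descriptor fields are set to () for convenience and the function binders is a hypothetical example consumer.

```haskell
{-# LANGUAGE TypeFamilies, PatternSynonyms #-}

import Data.Void (Void)

-- Cut-down extensible type, parametric in the type a of variables.
data ExpX xi a
  = VarX (XVar xi a) a
  | AbsX (XAbs xi a) a (ExpX xi a)
  | AppX (XApp xi a) (ExpX xi a) (ExpX xi a)
  | ExpX (XExp xi a)

-- Every extension family receives the type parameter as well.
type family XVar xi a
type family XAbs xi a
type family XApp xi a
type family XExp xi a

data LE
type instance XVar LE a = ()
type instance XAbs LE a = ()
type instance XApp LE a = ()
type instance XExp LE a = (a, ExpX LE a, ExpX LE a)  -- let x = m in n

type ExpLE a = ExpX LE a

pattern LetLE :: a -> ExpLE a -> ExpLE a -> ExpLE a
pattern LetLE x m n = ExpX (x, m, n)

-- All variables bound by a lambda or a let, in left-to-right order.
binders :: ExpLE a -> [a]
binders (VarX _ _)    = []
binders (AbsX _ x n)  = x : binders n
binders (AppX _ l m)  = binders l ++ binders m
binders (LetLE x m n) = x : binders m ++ binders n
```

The tuple in the XExp instance is where the extension "sees" the parameter: the bound variable of the let has whatever type the rest of the tree uses for variables.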
3.10 Existentials and GADTs

So far, we have considered extensibility in normal algebraic data type declarations.
However, in GHC, data may be defined by generalised algebraic data type
(GADT) declarations. For instance, consider the following GADT declaration of
a simple embedded domain-specific language of constants, application, addition,
and Boolean conjunction, with one existential variable a in App:

    data Exp a where
      Con :: c -> Exp c
      App :: Exp (a -> b) -> Exp a -> Exp b
      Add :: Exp (Int -> Int -> Int)
      And :: Exp (Bool -> Bool -> Bool)


We cannot print terms of this type: due to the polymorphic type of the field 
in Con, we need a printer for values of type a when printing Exp a, and since 
a is locally quantified in App and unavailable outside, we cannot supply such a 
printer from outside. We need to extend constructor App to store a printer for 
a right inside the constructor: 

    data Exp a where
      App :: (a -> String) -> Exp (a -> b) -> Exp a -> Exp b

Our idiom scales to generalised algebraic data types and supports extensibility
as above. To do so, we need to be able to access existential variables when
defining extensions. As in the previous section, we can do so by simply passing
the existential types to the extension type functions as well. For the above
example, we have the following extensible declaration:

    data ExpX ξ a where
      ConX :: XCon ξ c   -> c -> ExpX ξ c
      AppX :: XApp ξ a b -> ExpX ξ (a -> b) -> ExpX ξ a -> ExpX ξ b
      AddX :: XAdd ξ     -> ExpX ξ (Int -> Int -> Int)
      AndX :: XAnd ξ     -> ExpX ξ (Bool -> Bool -> Bool)
      ExpX :: XExp ξ a   -> ExpX ξ a

    type family XCon ξ c
    type family XApp ξ a b
    type family XAdd ξ
    type family XAnd ξ
    type family XExp ξ a

We can now define a variant of ExpX, where AppX is extended with a new
field to store a printer (Pr) for the existential type a:

    type ExpPr a = ExpX Pr a

    data Pr

    type instance XCon Pr c   = Void
    type instance XApp Pr a b = (a -> String)
    type instance XAdd Pr     = Void
    type instance XAnd Pr     = Void
    type instance XExp Pr a   = Void

As before, we can define pattern synonyms such as the following: 


    pattern AppPr :: (a -> String) -> ExpPr (a -> b) -> ExpPr a -> ExpPr b
    pattern AppPr p l m = AppX p l m
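
To show what the stored printer buys us, the sketch below evaluates a term of the Pr variant and prints every argument along its application spine, using the printer kept in each AppX node. It is an assumption-laden illustration: the And constructor is left out for brevity, and eval and args are hypothetical functions that do not appear in the text.

```haskell
{-# LANGUAGE TypeFamilies, GADTs #-}

import Data.Void (Void, absurd)

data ExpX xi a where
  ConX :: XCon xi c   -> c -> ExpX xi c
  AppX :: XApp xi a b -> ExpX xi (a -> b) -> ExpX xi a -> ExpX xi b
  AddX :: XAdd xi     -> ExpX xi (Int -> Int -> Int)
  ExpX :: XExp xi a   -> ExpX xi a

-- The App family also receives the existential type a.
type family XCon xi c
type family XApp xi a b
type family XAdd xi
type family XExp xi a

data Pr
type instance XCon Pr c   = Void
type instance XApp Pr a b = a -> String   -- printer for the hidden type a
type instance XAdd Pr     = Void
type instance XExp Pr a   = Void

void :: Void
void = error "Attempt to evaluate void"

eval :: ExpX Pr a -> a
eval (ConX _ c)   = c
eval (AppX _ l m) = eval l (eval m)
eval (AddX _)     = (+)
eval (ExpX v)     = absurd v

-- Print each argument on the spine with the printer stored next to it.
args :: ExpX Pr a -> [String]
args (AppX p l m) = args l ++ [p (eval m)]
args _            = []

-- (1 + 2), with show stored as the printer for both Int arguments.
example :: ExpX Pr Int
example = AppX show (AppX show (AddX void) (ConX void 1)) (ConX void 2)
```

Inside the AppX clause of args the type a is unknown, so the stored function is the only way to turn `eval m` into a String.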


One other solution for writing such a printer is to constrain the indices of
Exp with the Show type class. This involves adding a local type constraint for the
existential type a in AppX. Our idiom is also capable of expressing such extensions
to the set of local type constraints. For this purpose, we need to introduce
a proof data type Proof φ a, such that matching on its constructor convinces GHC
that the constraint φ a is satisfied. So to define a variant of ExpX where the
existential type is constrained with the Show type class (Sh), we have the following.

    data Proof φ a where
      Proof :: φ a => Proof φ a

    type ExpSh a = ExpX Sh a

    data Sh

    type instance XCon Sh c   = Void
    type instance XApp Sh a b = Proof Show a
    type instance XAdd Sh     = Void
    type instance XAnd Sh     = Void
    type instance XExp Sh a   = Void

Now, we can define pattern synonyms such as the following:

    pattern AppSh :: () => Show a => ExpSh (a -> b) -> ExpSh a -> ExpSh b
    pattern AppSh l m = AppX Proof l m

Similarly, we can add new locally quantified variables using a data type like

    data Exists f where
      Exists :: f a -> Exists f


3.11 Variations on Theme 

Our idiom is just that: a programming idiom. For field extensions (Section 3.3), 
nothing requires us to add an extra field to every constructor, or to use a different 
type function for every constructor. Similarly if we do not want to extend the 
data type with new constructors we do not need to provide the extra data 
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constructor that supports such extension (Section 3.4). For example, here is a 
more specialised variant of our running example 

data Expx £ = Litx (XLeaf £) Integer 

I Var x ( X Leaf £) Var 

| Ann x (Expx £) Typ 

| Abs x {X AA £) Var ( Exp x £) 

I ApVx (X AA £) (Expx £) [Expx £) 

Here constructors Litx and Varx share a single extension-field type, XL ea f £; 
and similarly Absx and App x ; the constructor Ann x does not have an extension 
field; and we cannot add new data constructors. 

3.12 Shortcomings and Scope 

Our approach comes with a number of shortcomings. 

— Efficiency. Every constructor carries an extra extension field, whether or not 
it is used. 

— Exhaustiveness checks. Our use of pattern synonyms (which is optional, of 
course) defeats GHC’s current pattern-match exhaustiveness checker. And 
even if we did not use a pattern synonym, the extra constructor ( Exp x in 
our running example) will be flagged as unmatched even when we are not 
using it. Both are problems of engineering, rather than fundamental. 5 

— Boilerplate. When adding a new phase descriptor, there is a slightly uneasy 
choice between (a) adding lots of tiresome declarations 

type instance Xc £ = Void 

one for each constructor C whose extension field is not used, and (b) omitting 
the instance, and hoping that no one adds it later. 

Similarly, writing lots of pattern-synonym declarations can be painful. 

One alternative we have considered is to generate the boilerplate using
Template Haskell, or even to define a new language extension. But it seems better
first to gain more experience of using the idiom.
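
The exhaustiveness and boilerplate points can be seen together in a small sketch. It assumes GHC 8.2 or later, which is newer than the version this paper targets, where a COMPLETE pragma lets the programmer assert that a set of pattern synonyms covers a type. The cut-down ExpX, the synonyms LitUD and AddUD, and the function eval are illustrative assumptions; UD plays the role of an undecorated descriptor and ξ is written xi.

```haskell
{-# LANGUAGE TypeFamilies, PatternSynonyms #-}

import Data.Void (Void)

data ExpX xi
  = LitX (XLit xi) Integer
  | AddX (XAdd xi) (ExpX xi) (ExpX xi)
  | ExpX (XExp xi)

type family XLit xi
type family XAdd xi
type family XExp xi

-- Boilerplate: one Void instance per unused extension point.
data UD
type instance XLit UD = Void
type instance XAdd UD = Void
type instance XExp UD = Void

type ExpUD = ExpX UD

-- Placeholder for unused fields; it is never forced.
void :: Void
void = error "Attempt to evaluate void"

-- More boilerplate: one synonym per constructor.
pattern LitUD :: Integer -> ExpUD
pattern LitUD i <- LitX _ i
  where LitUD i = LitX void i

pattern AddUD :: ExpUD -> ExpUD -> ExpUD
pattern AddUD l r <- AddX _ l r
  where AddUD l r = AddX void l r

-- Programmer's assertion that these two synonyms cover ExpUD.
{-# COMPLETE LitUD, AddUD #-}

eval :: ExpUD -> Integer
eval (LitUD i)   = i
eval (AddUD l r) = eval l + eval r
```

The pragma is an unchecked promise: it leaves out the ExpX constructor on the grounds that its field has type Void, and the compiler takes that on trust.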

Our idiom can naturally scale to support mutually recursive declarations by 
passing the same extension descriptor to all of the declarations. 
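
A minimal sketch of the mutually recursive case follows. The constructors LetX and BindX, the descriptor Loc, and the two traversal functions are hypothetical names chosen for illustration; the only point is that both declarations receive the same descriptor xi (the paper's ξ).

```haskell
{-# LANGUAGE TypeFamilies #-}

type Var = String

-- Expressions mention declarations and vice versa;
-- both are indexed by the same extension descriptor.
data ExpX xi
  = VarX (XVar xi) Var
  | LetX (XLet xi) [DeclX xi] (ExpX xi)

data DeclX xi
  = BindX (XBind xi) Var (ExpX xi)

type family XVar xi
type family XLet xi
type family XBind xi

-- Hypothetical descriptor: every node records a line number.
data Loc
type instance XVar  Loc = Int
type instance XLet  Loc = Int
type instance XBind Loc = Int

-- Collect line numbers in pre-order across both types.
expLines :: ExpX Loc -> [Int]
expLines (VarX l _)    = [l]
expLines (LetX l ds e) = l : concatMap declLines ds ++ expLines e

declLines :: DeclX Loc -> [Int]
declLines (BindX l _ e) = l : expLines e
```

Because one descriptor indexes both types, a single set of type instances decorates the whole family of declarations consistently.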

We have seen that our idiom is capable of expressing extensions to a generalised
algebraic data type declaration such as adding new fields, adding new
constructors, adding new local constraints, and adding new existential variables.
We have also seen that we can replace constructors, and access global and local
type variables in our extensions.

5 In fact, there are already partially implemented general features in GHC regarding
both completeness of a set of pattern synonyms, and improving the totality checker
to recognise absurdity.

In addition to these changes, we can combine our idiom with pattern
synonyms and module system features to express other changes like

— change to the order of fields, such as 
pattern (®) :: Exp -> Exp -> Exp
pattern m ® l = App l m

— removing fields, such as 
pattern K :: Exp -> Exp
pattern K n <- Abs _ n
  where K n = Abs "_" n

— fixing values of fields, such as 
pattern One :: Exp 
pattern One = Lit 1 
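
The three changes above can be tried out on a plain, non-extensible stand-in for Exp, as in the sketch below. The data type, the operator name (:$:), and the placeholder variable name "_" are assumptions for this example; Haskell requires a pattern-synonym operator to begin with a colon, so the symbol differs from the one printed above.

```haskell
{-# LANGUAGE PatternSynonyms #-}

type Var = String

-- Plain stand-in for the undecorated expression type (assumed here).
data Exp
  = Lit Integer
  | Var Var
  | Abs Var Exp
  | App Exp Exp
  deriving (Eq, Show)

-- Change to the order of fields: argument first, function second.
pattern (:$:) :: Exp -> Exp -> Exp
pattern m :$: l = App l m

-- Removing a field: the bound variable is ignored when matching
-- and filled with a placeholder when constructing.
pattern K :: Exp -> Exp
pattern K n <- Abs _ n
  where K n = Abs "_" n

-- Fixing the value of a field.
pattern One :: Exp
pattern One = Lit 1
```

The first and third synonyms are implicitly bidirectional, while K needs an explicit builder because the matcher throws information away.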

Yet, there are other possible forms of changes to a data type declaration, like 
adding new type variables. In the next section, we take a few steps further. 

4 Extension Descriptors 

So far in our examples, the extension descriptor parameters have been empty
types used as indices to define extensions. However, extension descriptors are
themselves ordinary algebraic data types, and in this section we study
extensibility using more complex extension descriptors.

4.1 New Type Parameter Extensions 

Suppose we wanted to add a new field of type a to some or all of the data 
constructors in the type. Then we would need to add a as a parameter of the 
data type itself. Can we do that? 

In our example, suppose we wanted to add a source location to every node
in the tree. Source location decorations associated with a node may appear
as new fields, or as new constructors wrapping nodes with a source location.
Strictly speaking, the latter approach is less precise than the former:
such wrapper constructors can be applied to a node more than once, or not
applied at all. With the former, the programmer is in control: using an optional
type (e.g., Maybe) of source locations in decorations models the optional
application of wrapper constructors, and using a list of source locations models
the multiple applications of wrapper constructors to a node.

Regardless of the decoration approach, the type of source locations (annotations
in general) is often kept polymorphic, allowing programmers to define generic
functions like fmap, fold, and traverse. A good example is the AST in
Haskell-Src-Exts, where the polymorphic annotations are used for different
purposes, including the source locations used in an exact printer. In our
extensible setting, support for polymorphic source locations amounts to
(1) extending the AST declarations with a new type parameter a (the type of
source locations) and (2) extending all the constructors with a new field of
type a. To do so, we need the ability to extend an ADT declaration with a set
of type variables, and to access these variables to define extensions, such as
new fields. Our encoding is capable of expressing such new type parameter
extensions: the idea is to carry the extra type parameters in the extension
descriptors. For instance, the following defines an extension to ExpX with a
new type variable a, and uses it to define polymorphic annotations An as new
field extensions.


type ExpAn a = ExpX (An a)

data An a

type instance XLit (An a) = a
type instance XVar (An a) = a
type instance XAnn (An a) = a
type instance XAbs (An a) = a
type instance XApp (An a) = a
type instance XExp (An a) = Void


Notice that we made the definition of the extension descriptor parametric, 
and then we could access the parameter when defining extensions. 
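
The following sketch shows one way the parametric descriptor can be used to write generic functions over annotations. The extensible type is restated so the block is self-contained; the functions mapAn and annotations, and the stand-in definitions of Var and Typ, are assumptions added for illustration, and ξ is written xi.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Void (Void, absurd)

type Var = String
data Typ = TInt | TFun Typ Typ

data ExpX xi
  = LitX (XLit xi) Integer
  | VarX (XVar xi) Var
  | AnnX (XAnn xi) (ExpX xi) Typ
  | AbsX (XAbs xi) Var (ExpX xi)
  | AppX (XApp xi) (ExpX xi) (ExpX xi)
  | ExpX (XExp xi)

type family XLit xi
type family XVar xi
type family XAnn xi
type family XAbs xi
type family XApp xi
type family XExp xi

-- Parametric descriptor: every constructor gets a field of type a.
data An a
type instance XLit (An a) = a
type instance XVar (An a) = a
type instance XAnn (An a) = a
type instance XAbs (An a) = a
type instance XApp (An a) = a
type instance XExp (An a) = Void

type ExpAn a = ExpX (An a)

-- An fmap-like function over the annotations.
mapAn :: (a -> b) -> ExpAn a -> ExpAn b
mapAn f (LitX a i)   = LitX (f a) i
mapAn f (VarX a x)   = VarX (f a) x
mapAn f (AnnX a e t) = AnnX (f a) (mapAn f e) t
mapAn f (AbsX a x e) = AbsX (f a) x (mapAn f e)
mapAn f (AppX a l m) = AppX (f a) (mapAn f l) (mapAn f m)
mapAn _ (ExpX v)     = absurd v

-- A fold-like function listing annotations in pre-order.
annotations :: ExpAn a -> [a]
annotations (LitX a _)   = [a]
annotations (VarX a _)   = [a]
annotations (AnnX a e _) = a : annotations e
annotations (AbsX a _ e) = a : annotations e
annotations (AppX a l m) = a : annotations l ++ annotations m
annotations (ExpX v)     = absurd v
```

Because ExpAn is a type synonym with the descriptor applied, it cannot directly be given a Functor instance; a newtype wrapper would be one way around that, at the cost of extra wrapping.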


4.2 Hierarchy of Extension Descriptors 

In practice, compilers may have multiple variants of an AST, many of which are 
closely related to each other. For instance in GHC, the AST in the front-end 
of the compiler, named HsSyn, has three major variations used in the parsing, 
renaming, and type-checking passes. GHC also has an entirely separate variant as 
a part of its metaprogramming mechanism Template Haskell. The first three are 
closely related, while the last is quite different. We can organise such variations 
by putting them in hierarchies of indices and use this hierarchy when defining 
extensions. For instance, for GHC, we may define the extension descriptor as

data GHC (c :: Component)
data Component = Compiler Pass | TemplateHaskell
data Pass = Parser | Renamer | TypeChecker

Having the above as a hierarchy of extension descriptors, we get the four
variations of HsSyn AST in the extensible setting. For instance, the type checker
AST would be of the type HsSyn (GHC (Compiler TypeChecker)).

It also allows us to define generic extension descriptors such as

type family PostTC p where
  PostTC TypeChecker = Typ
  PostTC _ = Void

type instance XApp (GHC TemplateHaskell) = Void
type instance XApp (GHC (Compiler p)) = PostTC p
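
A compilable rendering of this hierarchy might look like the sketch below, which uses the DataKinds extension to promote Component and Pass to the type level. The two-constructor ExpX, the synonym ExpTc, the function appType, and the definition of Typ are assumptions made to keep the example small; they stand in for HsSyn and are not GHC's actual definitions.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

import Data.Void (Void)

data Component = Compiler Pass | TemplateHaskell
data Pass = Parser | Renamer | TypeChecker

-- The descriptor is indexed by a promoted Component.
data GHC (c :: Component)

data Typ = TInt | TFun Typ Typ
  deriving (Eq, Show)

-- Closed family: a field that exists only after type checking.
type family PostTC (p :: Pass) where
  PostTC 'TypeChecker = Typ
  PostTC _            = Void

-- Cut-down stand-in for the extensible AST.
data ExpX xi
  = VarX String
  | AppX (XApp xi) (ExpX xi) (ExpX xi)

type family XApp xi

type instance XApp (GHC 'TemplateHaskell) = Void
type instance XApp (GHC ('Compiler p))    = PostTC p

type ExpTc = ExpX (GHC ('Compiler 'TypeChecker))

-- In the type-checked variant, applications carry a type.
appType :: ExpTc -> Maybe Typ
appType (AppX t _ _) = Just t
appType (VarX _)     = Nothing
```

One instance now covers all three compiler passes, and the closed family PostTC decides per pass whether the field is meaningful.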
5 Discussion and Related Works

The problem of extensibility in data types is a hot topic in computer science.
There are many different approaches to this problem. To name a few: structural
and nominal subtyping, extensible records and variants, and numerous
approaches to Wadler’s expression problem. There are too many solutions to
mention; the reader may consult the references in [Torgersen, 2004; Axelsson,
2012; Swierstra, 2008; Lindley and Cheney, 2012; Bahr and Hvitved, 2011; Löh
and Hinze, 2006] for some examples. However, surprisingly, our problem, and
hence our solution, has unique distinguishing characteristics:

Need for both directions of data extensibility: We need, and provide,
extensibility in two major directions of data extensibility: adding new fields to
existing constructors, and adding new constructors. The so-called “expression
problem” and its solutions are often only concerned with the latter form
of extensibility.

Generic programming is a plus, not a must: Our primary goal is to re-use
the data type declaration itself, rather than to re-use existing functions
that operate over that type. For example, in GHC, the parser produces 
HsSyn RdrName, the renamer consumes that and produces HsSyn Name, 
which the type checker consumes to produce HsSyn Id. All three passes 
are monomorphic: they consume and produce a single fixed variant of the 
underlying data type. 

In contrast, work addressing the expression problem is often concerned with
re-usability and compositionality of functions defined per case.

As we have seen with some examples (e.g., the generic printer), one can
write and reuse functions that are polymorphic in the extension descriptor,
but only by (a) simply discarding or preserving the decorations, or (b) using
auxiliary higher-order functions to process the decorations. If one wishes to
take functions written only for a specific variant of a data type and reuse
them, as an afterthought, for other variants, certain forms of static guarantees
(possibly, beyond what types currently provide) are required for safety.
One common practice here is to focus on a certain subclass of data types.

Trees are declared: In our setting, trees are often declared, rather than
being anonymous. There are well-known trade-offs between declared and
anonymous data structures. The former is simpler and less error-prone, and
the latter enables more opportunities for generic programming. Row polymorphism
and similar approaches often infer the structure of data from their uses,
leading to large types, bad performance, and complicated error messages.
Our approach is based on declaring both extensible and extended data types
(by describing the exact extensions). It resembles the long-lasting problem of
supporting anonymous records, such as Trex [Gaster, 1998], in GHC, where
solutions with declared flavour often dodge the problem by leaving programmers
to do some of the work by providing more information.

Similar to our idiom in spirit is McBride’s Ornaments [Dagand and McBride,
2014; Williams et al., 2014]. The key idea of ornaments is to declare
transformations of data types that preserve the recursive structures of data types,
with focus on reusing functions defined on the original for the transformed
data types. While our idiom can benefit from work on ornaments for such
reuse, there are decorations in practice that do not preserve the recursive
structures. For instance, in GHC, for better or worse, the constructor
representing if-expressions (like some others) is decorated with one additional
expression to store user-defined macros rebinding if-syntax, hence not
preserving the recursive structure.

Works with the current technology: Existing solutions often demand
changes to the compiler. Some others come at the price of losing certain
desirable properties, such as decidability of type inference, or predictability
of the performance. In contrast, our solution works in GHC right now (v8.0).
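
The two ways of writing descriptor-polymorphic functions mentioned under "Generic programming is a plus, not a must" can be sketched as follows. The cut-down ExpX, the descriptor Neg (which uses the extension constructor to add negation), and the functions size, printE, and printNeg are all hypothetical names for this example; ξ is written xi.

```haskell
{-# LANGUAGE TypeFamilies #-}

data ExpX xi
  = LitX (XLit xi) Integer
  | AddX (XAdd xi) (ExpX xi) (ExpX xi)
  | ExpX (XExp xi)

type family XLit xi
type family XAdd xi
type family XExp xi

-- (a) Polymorphic in xi by discarding every decoration.
size :: ExpX xi -> Int
size (LitX _ _)   = 1
size (AddX _ l r) = 1 + size l + size r
size (ExpX _)     = 1

-- (b) Polymorphic in xi by taking an auxiliary function
-- that knows how to handle the extension constructor.
printE :: (XExp xi -> String) -> ExpX xi -> String
printE _ (LitX _ i)   = show i
printE p (AddX _ l r) = "(" ++ printE p l ++ " + " ++ printE p r ++ ")"
printE p (ExpX x)     = p x

-- Hypothetical variant: the extension constructor means negation.
data Neg
type instance XLit Neg = ()
type instance XAdd Neg = ()
type instance XExp Neg = ExpX Neg

printNeg :: ExpX Neg -> String
printNeg = printE (\e -> "-" ++ printNeg e)
```

Neither function inspects a decoration it does not know about, which is what keeps both safe to reuse across variants.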

6 Conclusion 

In the 1980s we were mainly concerned with functional programming over terms,
but this paper has mainly focused on functional programming over types, with
the interesting new twist that type functions (unlike term functions) can be
open. We have explored how to leverage that type-level openness to allow
extensibility of term-level data constructors. David, we hope that you approve.
Happy birthday!